The main issue in supporting host-sharing is the co-existence of the guest-pulling and host-sharing versions of the nydus snapshotter, as well as other, non-nydus snapshotters.
Before starting a pod sandbox, containerd will check if the pause image is present on the machine. If it is not, it will "pull it" with the help of the snapshotter. This "pull" is crucial, as the nydus snapshotter will export the right Kata virtual volume to handle it:
- `guest-pull`: it will generate a `GuestImagePull` virtual volume that makes the Kata Agent try to `pull_image` inside the guest (for the pause image this is just unpacking the bundle in the initrd).
- `host-share`: it will convert the OCI layers to `tarfs` layers, and generate virtual volumes that indicate the blobs to mount into the guest.

Consequently, if we skip the "pull" that happens during `ensureImageExists`, we never generate these virtual volumes, and execution fails. To make sure we pull when we need to, containerd needs to keep a per-snapshotter map of the images it has already pulled. Here's the catch: `guest-pull` and `host-share` are technically the same snapshotter. (This also applies to variations of the `host-share` mode: `image_block`, `layer_block`, and each one with `_verity`.)

The solution we will adopt is to install host-sharing as a "different" snapshotter (which we implement here), together with a patch in containerd that keeps track of what images have been pulled on a per-snapshotter basis (not on a global basis).
On top of that, we need to do some work on Kata, but most of it is already in kata-containers/kata-containers#7837. The only notable additions are to manually start the udev daemon, as we are still using the Kata Agent as `/init`, and to check that the (host-)mounted dm-verity hashes actually correspond to the layer digests. The latter will have to wait until we re-introduce image signature validation and attestation, as we need to get the ground truth from somewhere.

Another issue we ran into while testing: if two layers have the same digest, the tarfs module in the nydus-snapshotter will sometimes trigger an error due to a race condition.
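The deferred dm-verity check could eventually look something like the sketch below: each mounted layer's measured verity root hash is compared against ground truth keyed by layer digest. Where that ground truth comes from (signed metadata, attestation) is exactly the open question above; here it is just an in-memory map, and all names are illustrative:

```go
package main

import "fmt"

// checkVerityRoots compares the measured dm-verity root hash of each
// host-mounted layer against a trusted expectation, keyed by layer
// digest. Hypothetical sketch: the ground-truth map would have to be
// populated from signature validation / attestation.
func checkVerityRoots(groundTruth, measured map[string]string) error {
	for layerDigest, gotRoot := range measured {
		wantRoot, ok := groundTruth[layerDigest]
		if !ok {
			return fmt.Errorf("no trusted verity root for layer %s", layerDigest)
		}
		if wantRoot != gotRoot {
			return fmt.Errorf("verity root mismatch for layer %s", layerDigest)
		}
	}
	return nil
}

func main() {
	truth := map[string]string{"sha256:aaa": "roothash-1"}
	// Matching root hash: the layer is accepted.
	fmt.Println(checkVerityRoots(truth, map[string]string{"sha256:aaa": "roothash-1"}))
	// A tampered layer is rejected.
	fmt.Println(checkVerityRoots(truth, map[string]string{"sha256:aaa": "evil"}) != nil)
}
```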
Once this PR is merged in, both snapshotters should be usable, without having to restart them, by using `inv nydus-snapshotter.set-mode [guest-pull|host-share]`. The host-share mode uses `layer_block_with_verity`. With `guest-pull` mode, however, there is an open issue in the upstream repo regarding snapshotter restarts: containerd/nydus-snapshotter#631. This means that, in general, when changing the snapshotter mode it is safer to purge all snapshots first.

Purging also proved tricky, as it is not enough to remove the contents of `/var/lib/containerd-nydus*`: containerd keeps track of snapshot metadata in its metadata DB in `/var/lib/containerd/io.containerd.metadata.v1.bolt/meta.db`. We cannot easily delete elements from that DB, so after removing the snapshots manually, we wait for the GC to remove the corresponding entries.